Learning policies for partially observable environments : Scaling upMichael
نویسندگان
چکیده
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems , existing techniques for nding optimal behavior do not appear to scale well and have been unable to nd satisfactory policies for problems with more than a dozen states. After a brief review of pomdp's, this paper discusses several simple solution methods and shows that all are capable of nding near-optimal policies for a selection of extremely small pomdp's taken from the learning literature. In contrast, we show that none are able to solve a slightly larger and noisier problem based on robot navigation. We nd that a combination of two novel approaches performs well on these problems and suggest methods for scaling to even larger and more complicated domains.
منابع مشابه
Learning Policies for Partially Observable Environments: Scaling Up
Partially observable Markov decision processes (pomdp's) model decision problems in which an agent tries to maximize its reward in the face of limited and/or noisy sensor feedback. While the study of pomdp's is motivated by a need to address realistic problems, existing techniques for nding optimal behavior do not appear to scale well and have been unable to nd satisfactory policies for problem...
متن کاملScaling Internal-State Policy-Gradient Methods for POMDPs
Policy-gradient methods have received increased attention recently as a mechanism for learning to act in partially observable environments. They have shown promise for problems admitting memoryless policies but have been less successful when memory is required. In this paper we develop several improved algorithms for learning policies with memory in an infinite-horizon setting — directly when a...
متن کاملA POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can be the factor of either increasing or decreasing system's availability, so it is valuable work to evaluate a maintenance policy from cost and availability point of view, simultaneously and according to decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
متن کاملLearning for Multiagent Decentralized Control in Large Partially Observable Stochastic Environments
This paper presents a probabilistic framework for learning decentralized control policies for cooperative multiagent systems operating in a large partially observable stochastic environment based on batch data (trajectories). In decentralized domains, because of communication limitations, the agents cannot share their entire belief states, so execution must proceed based on local information. D...
متن کاملApprenticeship Learning for Model Parameters of Partially Observable Environments
We consider apprenticeship learning — i.e., having an agent learn a task by observing an expert demonstrating the task — in a partially observable environment when the model of the environment is uncertain. This setting is useful in applications where the explicit modeling of the environment is difficult, such as a dialogue system. We show that we can extract information about the environment m...
متن کامل